{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "e794fdbe-e032-470d-8eda-584f0e4f3604",
   "metadata": {},
   "source": [
    "# Virtual Try-On with CatVTON and OpenVINO\n",
    "\n",
    "Virtual try-on methods based on diffusion models achieve realistic try-on effects but replicate the backbone network as a ReferenceNet or leverage additional image encoders to process condition inputs, resulting in high training and inference costs. [In this work](http://arxiv.org/abs/2407.15886), authors rethink the necessity of ReferenceNet and image encoders and innovate the interaction between garment and person, proposing CatVTON, a simple and efficient virtual try-on diffusion model.\n",
    "It facilitates the seamless transfer of in-shop or worn garments of arbitrary categories to target persons by simply\n",
    "concatenating them in spatial dimensions as inputs. The efficiency of the model is demonstrated in three aspects: \n",
    " 1. Lightweight network. Only the original diffusion modules are used, without additional network modules. The text encoder and cross attentions for text injection in the backbone are removed, further reducing the parameters by 167.02M.\n",
    " 2. Parameter-efficient training. We identified the try-on relevant modules through experiments and achieved high-quality try-on effects by training only 49.57M parameters (∼5.51% of the backbone network’s parameters). \n",
    " 3. Simplified inference. CatVTON eliminates all unnecessary conditions and preprocessing steps, including pose estimation, human parsing, and text input, requiring only garment reference, target person image, and mask for the virtual try-on process. Extensive experiments demonstrate that CatVTON achieves superior qualitative and quantitative results with fewer prerequisites and trainable parameters than baseline methods. Furthermore, CatVTON shows good generalization in in-the-wild scenarios despite using open-source datasets with only 73K samples.\n",
    "\n",
    "\n",
    "Teaser image from [CatVTON GitHub](https://github.com/Zheng-Chong/CatVTON)\n",
    "![teaser](https://github.com/Zheng-Chong/CatVTON/blob/edited/resource/img/teaser.jpg?raw=true)\n",
    "\n",
    "In this tutorial we consider how to convert and run this model using OpenVINO. An additional part demonstrates how to run optimization with [NNCF](https://github.com/openvinotoolkit/nncf/) to speed up pipeline.\n",
    "\n",
    "\n",
    "#### Table of contents:\n",
    "\n",
    "- [Prerequisites](#Prerequisites)\n",
    "- [Convert the model to OpenVINO IR](#Convert-the-model-to-OpenVINO-IR)\n",
    "- [Compiling models](#Compiling-models)\n",
    "- [Optimize model using NNCF Post-Training Quantization API](#Optimize-model-using-NNCF-Post-Training-Quantization-API)\n",
    "    - [Run Post-Training Quantization](#Run-Post-Training-Quantization)\n",
    "    - [Run Weights Compression](#Run-Weights-Compression)\n",
    "    - [Compare model file sizes](#Compare-model-file-sizes)\n",
    "- [Interactive demo](#Interactive-demo)\n",
    "\n",
    "\n",
    "### Installation Instructions\n",
    "\n",
    "This is a self-contained example that relies solely on its own code.\n",
    "\n",
    "We recommend  running the notebook in a virtual environment. You only need a Jupyter server to start.\n",
    "For details, please refer to [Installation Guide](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/README.md#-installation-guide).\n",
    "\n",
    "<img referrerpolicy=\"no-referrer-when-downgrade\" src=\"https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/catvton/catvton.ipynb\" />\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "3b91d041-f139-47c7-bca8-805249648b6f",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1f0cac2-0661-4974-a5c6-34d0bed7c92c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import platform\n",
    "\n",
    "%pip install -q \"openvino>=2024.4\" \"nncf>=2.13.0\"\n",
    "%pip install -q \"torch==2.8\" \"diffusers>=0.29.1\" torchvision opencv_python --extra-index-url https://download.pytorch.org/whl/cpu\n",
    "%pip install -q fvcore \"pillow\" \"tqdm\" \"gradio>=4.36\" \"omegaconf==2.4.0.dev3\" av pycocotools cloudpickle scipy accelerate \"transformers==4.53.3\"\n",
    "\n",
    "if platform.system() == \"Darwin\":\n",
    "    %pip install -q \"numpy<2.0.0\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14f62ef8-89d1-4ddc-9058-aa09fcc763fb",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "from pathlib import Path\n",
    "\n",
    "if not Path(\"notebook_utils.py\").exists():\n",
    "    r = requests.get(\n",
    "        url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\",\n",
    "    )\n",
    "    open(\"notebook_utils.py\", \"w\").write(r.text)\n",
    "\n",
    "if not Path(\"cmd_helper.py\").exists():\n",
    "    r = requests.get(\n",
    "        url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/cmd_helper.py\",\n",
    "    )\n",
    "    open(\"cmd_helper.py\", \"w\").write(r.text)\n",
    "\n",
    "# Read more about telemetry collection at https://github.com/openvinotoolkit/openvino_notebooks?tab=readme-ov-file#-telemetry\n",
    "from notebook_utils import collect_telemetry\n",
    "\n",
    "collect_telemetry(\"catvton.ipynb\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e0d059c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from cmd_helper import clone_repo\n",
    "\n",
    "\n",
    "clone_repo(\"https://github.com/Zheng-Chong/CatVTON.git\", \"3b795364a4d2f3b5adb365f39cdea376d20bc53c\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a18a61d2-910a-478d-9901-703de65377f3",
   "metadata": {},
   "source": [
    "### Convert the model to OpenVINO IR\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    " \n",
    "\n",
    "OpenVINO supports PyTorch models via conversion to OpenVINO Intermediate Representation (IR). [OpenVINO model conversion API](https://docs.openvino.ai/2024/openvino-workflow/model-preparation.html#convert-a-model-with-python-convert-model) should be used for these purposes. `ov.convert_model` function accepts original PyTorch model instance and example input for tracing and returns `ov.Model` representing this model in OpenVINO framework. Converted model can be used for saving on disk using `ov.save_model` function or directly loading on device using `core.complie_model`.\n",
    "\n",
    "`ov_catvton_helper.py` script contains helper function for models downloading and models conversion, please check its content if you interested in conversion details.\n",
    "\n",
    "To download checkpoints and load models, just call the helper function `download_models`. It takes care about it.\n",
    "Functions `convert_pipeline_models` and `convert_automasker_models` will convert models from pipeline and `automasker` in OpenVINO format.\n",
    "\n",
    "The original pipeline contains VAE encoder and decoder and UNET.\n",
    "![CatVTON-overview](https://github.com/user-attachments/assets/e35c8dab-1c54-47b1-a73b-2a62e6cdca7c)\n",
    "\n",
    "The `automasker` contains `DensePose` with `detectron2.GeneralizedRCNN` model and `SCHP` (`LIP` and `ATR` version).\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "228e4320-3b0b-45e5-a1ad-bb09878fe239",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ov_catvton_helper import download_models, convert_pipeline_models, convert_automasker_models\n",
    "\n",
    "pipeline, mask_processor, automasker = download_models()\n",
    "vae_scaling_factor = pipeline.vae.config.scaling_factor\n",
    "convert_pipeline_models(pipeline)\n",
    "convert_automasker_models(automasker)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "cef1940a-9830-49e3-a625-58adc3f13fdf",
   "metadata": {},
   "source": [
    "## Compiling models\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "Select device from dropdown list for running inference using OpenVINO."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0629837a-25ec-47f9-8fd9-f87fc87fc1f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import openvino as ov\n",
    "\n",
    "from notebook_utils import device_widget\n",
    "\n",
    "\n",
    "core = ov.Core()\n",
    "\n",
    "device = device_widget()\n",
    "\n",
    "device"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "fdd405d5-5c57-4da5-b258-d458805a3afc",
   "metadata": {},
   "source": [
    "`get_compiled_pipeline` and `get_compiled_automasker`  functions defined in `ov_catvton_helper.py` provides convenient way for getting the pipeline and the `automasker` with compiled ov-models that are compatible with the original interface. It accepts the original pipeline and `automasker`, inference device and directories with converted models as arguments. Under the hood we create callable wrapper classes for compiled models to allow interaction with original pipelines. Note that all of wrapper classes return `torch.Tensor`s instead of `np.array`s. And then insert wrappers instances in the pipeline. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8612d4be-e0cf-4249-881e-5270cc33ef28",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ov_catvton_helper import (\n",
    "    get_compiled_pipeline,\n",
    "    get_compiled_automasker,\n",
    "    VAE_ENCODER_PATH,\n",
    "    VAE_DECODER_PATH,\n",
    "    UNET_PATH,\n",
    "    DENSEPOSE_PROCESSOR_PATH,\n",
    "    SCHP_PROCESSOR_ATR,\n",
    "    SCHP_PROCESSOR_LIP,\n",
    ")\n",
    "\n",
    "pipeline = get_compiled_pipeline(pipeline, core, device, VAE_ENCODER_PATH, VAE_DECODER_PATH, UNET_PATH, vae_scaling_factor)\n",
    "automasker = get_compiled_automasker(automasker, core, device, DENSEPOSE_PROCESSOR_PATH, SCHP_PROCESSOR_ATR, SCHP_PROCESSOR_LIP)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "36cdac0a",
   "metadata": {},
   "source": [
    "## Optimize model using NNCF Post-Training Quantization API\n",
    "\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "[NNCF](https://github.com/openvinotoolkit/nncf/) provides a suite of advanced algorithms for Neural Networks inference optimization in OpenVINO with minimal accuracy drop. We will use 8-bit quantization in post-training mode (without the fine-tuning pipeline) for the UNet model, and 4-bit weight compression for the remaining models.\n",
    "\n",
 **NOTE**">
    "> **NOTE**: Quantization is a time- and memory-consuming operation. Running the quantization code below may take some time. You can disable it using the widget below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0dcadfd",
   "metadata": {},
   "outputs": [],
   "source": [
    "from notebook_utils import quantization_widget\n",
    "\n",
    "to_quantize = quantization_widget()\n",
    "\n",
    "to_quantize"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "fb08228d",
   "metadata": {},
   "source": [
    "Let's load `skip magic` extension to skip quantization if `to_quantize` is not selected"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1b307bdd",
   "metadata": {},
   "outputs": [],
   "source": [
    "is_optimized_pipe_available = False\n",
    "\n",
    "# Fetch skip_kernel_extension module\n",
    "if not Path(\"skip_kernel_extension.py\").exists():\n",
    "    r = requests.get(\n",
    "        url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/skip_kernel_extension.py\",\n",
    "    )\n",
    "    open(\"skip_kernel_extension.py\", \"w\").write(r.text)\n",
    "\n",
    "%load_ext skip_kernel_extension"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4c26443b",
   "metadata": {},
   "source": [
    "### Run Post-Training Quantization\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "The optimization process contains the following steps:\n",
    "\n",
    "1. Create a Dataset for quantization.\n",
    "2. Run `nncf.quantize` for getting an optimized model.\n",
    "3. Serialize an OpenVINO IR model, using the `openvino.save_model` function."
   ]
  },
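  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7e2a9c41",
   "metadata": {},
   "source": [
    "Steps 1 and 2 depend on calibration data because the quantizer needs representative value ranges to choose its integer scales. The following toy sketch is not NNCF's algorithm; it only illustrates the idea in plain Python, assuming symmetric per-tensor 8-bit quantization. The names `calibrate_scale`, `quantize`, and `dequantize` are hypothetical and do not exist in NNCF.\n",
    "\n",
    "```python\n",
    "def calibrate_scale(samples, num_bits=8):\n",
    "    # Assumption: symmetric scale derived from the largest absolute calibration value.\n",
    "    max_abs = max(abs(v) for sample in samples for v in sample)\n",
    "    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits\n",
    "    return max_abs / qmax if max_abs > 0 else 1.0\n",
    "\n",
    "\n",
    "def quantize(values, scale, num_bits=8):\n",
    "    # Round to the nearest integer step and clamp to the signed integer range.\n",
    "    qmax = 2 ** (num_bits - 1) - 1\n",
    "    return [max(-qmax, min(qmax, round(v / scale))) for v in values]\n",
    "\n",
    "\n",
    "def dequantize(q_values, scale):\n",
    "    # Map the integers back to approximate floating-point values.\n",
    "    return [q * scale for q in q_values]\n",
    "\n",
    "\n",
    "# Two made-up calibration samples; the largest magnitude (1.27) fixes the scale.\n",
    "calibration_samples = [[0.5, -1.27, 0.02], [1.0, -0.3, 0.75]]\n",
    "scale = calibrate_scale(calibration_samples)\n",
    "\n",
    "q = quantize([0.5, -1.27, 2.0], scale)\n",
    "restored = dequantize(q, scale)\n",
    "print(q)\n",
    "print(restored)\n",
    "```\n",
    "\n",
    "Values outside the calibrated range, such as `2.0` here, are clamped to the largest representable integer, which is why the calibration samples should resemble the inputs seen at inference time."
   ]
  },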
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "9ed02ed4",
   "metadata": {},
   "source": [
    "We use a couple of images from the original repository as calibration data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51b20f31",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%skip not $to_quantize.value\n",
    "\n",
    "from pathlib import Path\n",
    "from catvton_quantization_helper import collect_calibration_data, UNET_INT8_PATH\n",
    "\n",
    "dataset = [\n",
    "    (\n",
    "        Path(\"CatVTON/resource/demo/example/person/men/model_5.png\"),\n",
    "        Path(\"CatVTON/resource/demo/example/condition/upper/24083449_54173465_2048.jpg\"),\n",
    "    ),\n",
    "    (\n",
    "        Path(\"CatVTON/resource/demo/example/person/women/2-model_4.png\"),\n",
    "        Path(\"CatVTON/resource/demo/example/condition/overall/21744571_51588794_1000.jpg\"),\n",
    "    ),\n",
    "]\n",
    "\n",
    "if not UNET_INT8_PATH.exists():\n",
    "    subset_size = 100\n",
    "    calibration_data = collect_calibration_data(pipeline, automasker, mask_processor, dataset, subset_size)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f64b96e4",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%skip not $to_quantize.value\n",
    "\n",
    "import gc\n",
    "import nncf\n",
    "from ov_catvton_helper import UNET_PATH\n",
    "\n",
    "# cleanup before quantization to free memory\n",
    "del pipeline\n",
    "del automasker\n",
    "gc.collect()\n",
    "\n",
    "\n",
    "if not UNET_INT8_PATH.exists():\n",
    "    unet = core.read_model(UNET_PATH)\n",
    "    quantized_model = nncf.quantize(\n",
    "        model=unet,\n",
    "        calibration_dataset=nncf.Dataset(calibration_data),\n",
    "        subset_size=subset_size,\n",
    "        model_type=nncf.ModelType.TRANSFORMER,\n",
    "    )\n",
    "    ov.save_model(quantized_model, UNET_INT8_PATH)\n",
    "    del quantized_model\n",
    "    gc.collect()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7a8fff93",
   "metadata": {},
   "source": [
    "### Run Weights Compression\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "Quantizing of the remaining components of the pipeline does not significantly improve inference performance but can lead to a substantial degradation of accuracy. The weight compression will be applied to footprint reduction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8aa19a12",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%skip not $to_quantize.value\n",
    "\n",
    "from catvton_quantization_helper import compress_models\n",
    "\n",
    "compress_models(core)\n",
    "\n",
    "is_optimized_pipe_available = True"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "cf6b22b4",
   "metadata": {},
   "source": [
    "### Compare model file sizes\n",
    "[back to top ⬆️](#Table-of-contents:)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "51e9aed8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vae_encoder compression rate: 2.011\n",
      "vae_decoder compression rate: 2.007\n",
      "unet compression rate: 1.995\n",
      "densepose_processor compression rate: 2.019\n",
      "schp_processor_atr compression rate: 1.993\n",
      "schp_processor_lip compression rate: 1.993\n"
     ]
    }
   ],
   "source": [
    "%%skip not $to_quantize.value\n",
    "from catvton_quantization_helper import compare_models_size\n",
    "\n",
    "compare_models_size()"
   ]
  },
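  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d05b6f3a",
   "metadata": {},
   "source": [
    "The implementation of `compare_models_size` is not shown in this notebook. A minimal sketch of how such a rate could be computed, assuming it is the size of the original `.bin` weights file divided by the size of the optimized one, is given below. The function name `compression_rate` and the stand-in files are hypothetical.\n",
    "\n",
    "```python\n",
    "import tempfile\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def compression_rate(original_bin, optimized_bin):\n",
    "    # Assumption: ratio of on-disk sizes; values above 1 mean the optimized file is smaller.\n",
    "    return Path(original_bin).stat().st_size / Path(optimized_bin).stat().st_size\n",
    "\n",
    "\n",
    "# Stand-in files instead of real IR weights: 2000 bytes vs. 1000 bytes.\n",
    "with tempfile.TemporaryDirectory() as tmp:\n",
    "    original_bin = Path(tmp) / 'model.bin'\n",
    "    optimized_bin = Path(tmp) / 'model_optimized.bin'\n",
    "    original_bin.write_bytes(bytes(2000))\n",
    "    optimized_bin.write_bytes(bytes(1000))\n",
    "    rate = compression_rate(original_bin, optimized_bin)\n",
    "\n",
    "print(f'compression rate: {rate:.3f}')\n",
    "```\n",
    "\n",
    "Under this assumption, a rate close to 2 means the optimized weights file takes about half the disk space of the original."
   ]
  },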
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "4f696671-3958-48d7-be83-e7b1b225cec7",
   "metadata": {},
   "source": [
    "## Interactive inference\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
    "Please select below whether you would like to use the quantized models to launch the interactive demo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "706942f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from ov_catvton_helper import get_pipeline_selection_option\n",
    "\n",
    "use_quantized_models = get_pipeline_selection_option(is_optimized_pipe_available)\n",
    "\n",
    "use_quantized_models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "317995b2-f331-413c-99cf-ae7af6a87f94",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from gradio_helper import make_demo\n",
    "\n",
    "from catvton_quantization_helper import (\n",
    "    VAE_ENCODER_INT4_PATH,\n",
    "    VAE_DECODER_INT4_PATH,\n",
    "    DENSEPOSE_PROCESSOR_INT4_PATH,\n",
    "    SCHP_PROCESSOR_ATR_INT4,\n",
    "    SCHP_PROCESSOR_LIP_INT4,\n",
    "    UNET_INT8_PATH,\n",
    ")\n",
    "\n",
    "pipeline, mask_processor, automasker = download_models()\n",
    "if use_quantized_models.value:\n",
    "    pipeline = get_compiled_pipeline(pipeline, core, device, VAE_ENCODER_INT4_PATH, VAE_DECODER_INT4_PATH, UNET_INT8_PATH, vae_scaling_factor)\n",
    "    automasker = get_compiled_automasker(automasker, core, device, DENSEPOSE_PROCESSOR_INT4_PATH, SCHP_PROCESSOR_ATR_INT4, SCHP_PROCESSOR_LIP_INT4)\n",
    "else:\n",
    "    pipeline = get_compiled_pipeline(pipeline, core, device, VAE_ENCODER_PATH, VAE_DECODER_PATH, UNET_PATH, vae_scaling_factor)\n",
    "    automasker = get_compiled_automasker(automasker, core, device, DENSEPOSE_PROCESSOR_PATH, SCHP_PROCESSOR_ATR, SCHP_PROCESSOR_LIP)\n",
    "\n",
    "output_dir = \"output\"\n",
    "demo = make_demo(pipeline, mask_processor, automasker, output_dir)\n",
    "try:\n",
    "    demo.launch(debug=True)\n",
    "except Exception:\n",
    "    demo.launch(debug=True, share=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  },
  "openvino_notebooks": {
   "imageUrl": "https://github.com/user-attachments/assets/55319c63-f01c-4591-ac1e-3bb4e57dda35",
   "tags": {
    "categories": [
     "Model Demos",
     "AI Trends"
    ],
    "libraries": [],
    "other": [],
    "tasks": [
     "Image-to-Image"
    ]
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
